SearCas 2: Dataset Quest

April 09, 2019


An investigation of datasets relevant to dehazing research.

Overview

Performance analysis for flow and stereo is addressed from two main directions:

  • First, synthetic images can be created for benchmarks (e.g. with computer graphics);
  • Second, reference results for real images are measured with specialized hardware.
  • (A third alternative is to not generate ground truth at all and leave the benchmarking to experts.)

Creating synthetic ground truth by simulation

  • It is relatively straightforward to generate flow and depth ground truth while allowing for systematic variations in all scene parameters, such as material properties, light sources, and animations.
  • The first widely recognized synthetic dataset with a benchmarking website was MPI-Sintel which used existing assets from a Blender movie to generate a dataset termed naturalistic, addressing the fact that the data looks a bit more like a cartoon than the real world, but still resembles real images to some degree.
  • While some preliminary statistical results indicate that synthetic data can be used as ground truth, these methods have not been thoroughly evaluated with respect to their representativeness.

Creating ground truth by measurement

  • the challenge lies in creating reference data which is accurate enough.

  • measurement:

    • manual measurements:

      Although the accuracy is good compared to reference data from the Middlebury flow benchmark, annotations are not measurements. Possible biases introduced by humans have yet to be investigated.

    • (Semi-)automatic measurement:

      • have a human in the loop to correct algorithm results.
      • more reliable, but only work in restricted scenarios.
      • using more than two cameras, or additional modalities such as structured light, LIDAR, UV-paint with multiple exposures and light sources, or others.
      • not as costly as completely manual processing.
      • they still are prone to outliers and biases caused by measurement devices and human corrections.
      • it is the only currently known method for large-scale, outdoor stereo and flow ground truth generation.
      • To deal with such uncertainties during benchmarking, several approaches have been developed.

In summary, ground truth is created in two main ways:

  • synthetic datasets with very precise ground truth, generated via game engines or animated simulation;
  • ground truth with some residual error for real captured datasets, produced manually or algorithmically based on information about the capture hardware.

Mainly from The HCI Benchmark Suite: Stereo And Flow Ground Truth With Uncertainties for Urban Autonomous Driving.

Datasets

Virtual

Scene Flow

Source

CVPR2016

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

Keywords

>39000 stereo 960x540 synthetic accurate disparity

Features

  • A 3D synthetic dataset created with Blender, intended mainly for optical flow estimation;
  • Provides the intrinsic camera parameters (focal length, principal point) and rendering settings (image size, virtual sensor size and format);
  • Depth is directly retrieved from a pixel's 3D position and converted to disparity using the known configuration of the virtual stereo rig (see the sketch after this list).
  • All image data was rendered using a virtual focal length of 35mm on a 32mm-wide simulated sensor. For the Driving dataset a wide-angle version with a 15mm focal length was added, which is visually closer to the existing KITTI datasets.
  • The authors train a network for disparity estimation, which also yields competitive performance on previous benchmarks, especially among methods that run in real time.
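
The depth-to-disparity conversion mentioned above follows the standard rectified pinhole relation. Below is a minimal sketch, assuming the 35mm focal length, 32mm sensor width, and 960px image width stated in the paper; the function and variable names are illustrative, not part of the dataset's official tooling.

```python
import numpy as np

def depth_to_disparity(depth, focal_px, baseline):
    """Rectified pinhole relation: disparity = focal_px * baseline / depth."""
    return focal_px * baseline / np.asarray(depth, dtype=np.float64)

# Focal length in pixels for a 35mm lens on a 32mm-wide virtual sensor
# rendered at 960px image width (numbers taken from the paper).
focal_px = 35.0 / 32.0 * 960.0   # = 1050 px
baseline = 1.0                   # stereo baseline in Blender units (Driving)
print(depth_to_disparity([10.0, 20.0], focal_px, baseline))  # approx. [105.  52.5]
```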

Driving Subset

Designed to resemble KITTI 2015.

The Driving scene is a mostly naturalistic, dynamic street scene from the viewpoint of a driving car, made to resemble the KITTI datasets. It uses car models from the same pool as the FlyingThings3D dataset and additionally employs highly detailed tree models from 3D Warehouse and simple street lights. In Fig. 4 of the paper, selected frames from Driving are shown next to lookalike frames from KITTI 2015. The stereo baseline is set to 1 Blender unit, which together with a typical car model width of roughly 2 units is comparable to KITTI's setting (54cm baseline, 186cm car width [8]).

Comparison

(comparison figure from the Scene Flow paper)

Usage

  • finetune
  • train

Virtual KITTI

Source

CVPR2016

Virtual Worlds as Proxy for Multi-Object Tracking Analysis

Keywords

21,260 frames monocular 1242x375 synthetic accurate depth multi weather conditions

Features

  • 35 photo-realistic synthetic videos (5 cloned from the original real-world KITTI tracking benchmark, coupled with 7 variations each) for a total of approximately 17,000 high resolution frames, all with automatic accurate ground truth;
  • Created using the Unity game engine and a novel real-to-virtual cloning method;
  • As the gap between real and virtual worlds is small, virtual worlds enable measuring the impact of various weather and imaging conditions on recognition performance, all other things being equal.

Comparison

Like KITTI (5 of the sequences are clones of real KITTI tracking sequences).

Usage

  • train
  • test (fog part)

Real

Cityscapes

Source

CVPR2016

The Cityscapes Dataset for Semantic Urban Scene Understanding

The Cityscapes Dataset

Keywords

stereo 2048x1024

Features

  • A new large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames.
  • The precomputed depth/disparity maps, obtained with SGM, contain large holes and artifacts and need to be fixed before use (see the sketch after this list).
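
As a minimal sketch of the kind of fix meant above (assuming holes are marked with 0 in the precomputed disparity maps), one can replace each invalid pixel with the nearest valid disparity; this is only an illustration, not the preprocessing used by the Cityscapes authors.

```python
import numpy as np
from scipy import ndimage

def fill_disparity_holes(disp, invalid_value=0.0):
    """Replace invalid (hole) pixels with the disparity of the nearest
    valid pixel. `disp` is an H x W float array."""
    invalid = (disp == invalid_value)
    if not invalid.any():
        return disp.copy()
    # For every pixel, indices of the nearest valid (non-hole) pixel.
    rows, cols = ndimage.distance_transform_edt(
        invalid, return_distances=False, return_indices=True)
    return disp[rows, cols]
```

In practice a more careful completion (e.g. edge-aware inpainting) is usually preferable; the point here is only what "fixing" the SGM maps involves.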

Usage

  • train
  • test

KITTI

Source

IJRR2013

Vision meets robotics: The KITTI dataset

CVPR2015

Object Scene Flow for Autonomous Vehicles

CVPR2012

Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

KITTI数据集简介与使用 (a Chinese-language introduction to the KITTI dataset and its usage)

Keywords

stereo coarse ground truth 1392x512 non-synthetic

Features

  • KITTI 2012: consists of 194 training image pairs and 195 test image pairs, saved in lossless PNG format.

  • KITTI 2015: consists of 200 training scenes and 200 test scenes (4 color images per scene, saved in lossless PNG format). Compared to the stereo 2012 and flow 2012 benchmarks, it comprises dynamic scenes for which the ground truth has been established in a semi-automatic process.

  • The KITTI dataset was created jointly by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute in the USA, and is currently the largest computer vision benchmark for autonomous driving scenarios. KITTI contains real images captured in urban, rural, and highway scenes, with up to 15 cars and 30 pedestrians per image and various degrees of occlusion and truncation. The full dataset consists of 389 stereo and optical flow image pairs, 39.2 km of visual odometry sequences, and images with more than 200k annotated 3D objects, sampled and synchronized at 10 Hz. Overall, the raw data is categorized into 'Road', 'City', 'Residential', 'Campus', and 'Person'. It was recorded over 5 days in 2011 and amounts to 180 GB. For 3D object detection, the labels are subdivided into car, van, truck, pedestrian, pedestrian (sitting), cyclist, tram, and misc.

  • For stereo and optical flow, evaluation is based on the disparity error and the end-point error, respectively, from which the average number of erroneous pixels is computed (see the sketch at the end of this Features section).

  • Consider removing the Calibration and Person parts.

  • The ground truth is not precise enough:

    their depth information is less precise and incomplete compared to indoor datasets. For example, due to the limitations of RGB-based depth cameras, … and the KITTI dataset has at least 7 meters of average error.

    from Benchmarking Single Image Dehazing and Beyond

    Such methods perform even worse outdoors, with at least 4 meters of average error on Make3D and KITTI datasets.

    from Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image

    KITTI has a sparser and less accurate point cloud (±2 cm according to the manufacturer) which is densified using a semi-automatic ICP step.

    Uncertainties (R10) are not available in KITTI, but according to the authors, most disparities are around 3 px accurate with some flow vectors containing relative errors around 5% of the magnitude.

    from The HCI Benchmark Suite: Stereo And Flow Ground Truth With Uncertainties for Urban Autonomous Driving

    The stereo ground truth is generated from the camera calibration using the procedure quoted below; to avoid introducing errors, missing disparities are not filled in by interpolation, which leads to a final average disparity density of about 50%.

    To obtain a high stereo and optical flow ground truth density, we register a set of consecutive frames (5 before and 5 after the frame of interest) using ICP. We project the accumulated point clouds onto the image and automatically remove points falling outside the image. We then manually remove all ambiguous image regions such as windows and fences. Given the camera calibration, the corresponding disparity maps are readily computed. Optical flow fields are obtained by projecting the 3D points into the next frame. For both tasks we evaluate both non-occluded pixels as well as all pixels for which ground truth is available. Our non-occluded evaluation excludes all surface points falling outside the image plane. Points occluded by objects within the same image could not be reliably estimated in a fully automatic manner due to the properties of the laser scanner. To avoid artificial errors, we do not interpolate the ground truth disparity maps and optical flow fields, leading to a ∼50% average ground truth density.

    from Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

    For the final comparison against ground truth during evaluation, background interpolation is used to produce dense disparity maps (a sketch of this interpolation is given at the end of this section):

    Missing disparities are filled-in for each algorithm using background interpolation [23] to produce dense disparity maps which can then be compared.

    from Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite
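
For concreteness, here is a minimal sketch of the KITTI-style evaluation described above: the bad-pixel rate for stereo (absolute disparity error above a threshold, typically 3 px) and the average end-point error for flow, both computed only where the ~50%-dense ground truth is valid. This is an illustrative re-implementation, not the official devkit code.

```python
import numpy as np

def bad_pixel_rate(disp_est, disp_gt, valid_mask, tau=3.0):
    """Fraction of pixels whose absolute disparity error exceeds `tau` px,
    counted only where ground truth is available (`valid_mask`)."""
    err = np.abs(disp_est - disp_gt)
    bad = (err > tau) & valid_mask
    return bad.sum() / max(int(valid_mask.sum()), 1)

def average_epe(flow_est, flow_gt, valid_mask):
    """Mean end-point error over valid ground-truth pixels (flow: H x W x 2)."""
    epe = np.linalg.norm(flow_est - flow_gt, axis=-1)
    return float(epe[valid_mask].mean())
```

And a sketch of the background interpolation used to densify the maps before comparison: under the assumption that missing pixels belong to the background, interior gaps in each row are filled with the smaller of the two bounding disparities. Again, this mimics the idea rather than reproducing the devkit implementation.

```python
import numpy as np

def interpolate_background(disp, invalid_value=0.0):
    """Row-wise fill of missing disparities: interior gaps receive the
    smaller (more distant, i.e. background) of the two bounding valid
    disparities; gaps at the row borders receive the nearest valid value."""
    out = disp.copy()
    for y in range(disp.shape[0]):
        valid_x = np.flatnonzero(disp[y] != invalid_value)
        if valid_x.size == 0:
            continue
        out[y, :valid_x[0]] = disp[y, valid_x[0]]        # left border
        out[y, valid_x[-1] + 1:] = disp[y, valid_x[-1]]  # right border
        for left, right in zip(valid_x[:-1], valid_x[1:]):
            if right > left + 1:                         # interior gap
                out[y, left + 1:right] = min(disp[y, left], disp[y, right])
    return out
```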

Usage

  • finetune (real part)
  • train

WildDash

Source

ECCV2018

WildDash - Creating Hazard-Aware Benchmarks

WildDash Benchmark

Keywords

test dataset small multi scenarios

Features

  • a new test dataset for semantic and instance segmentation for the automotive domain
  • to improve the expressiveness of performance evaluation for computer vision algorithms in regard to their robustness for driving scenarios under real-world conditions.
  • a variety of data sources from all over the world with many different difficult scenarios (e.g. rain, road coverage, darkness, overexposure) and camera characteristics (noise, compression artifacts, distortion).
  • Risk clusters: groups of visual hazards (such as the difficult scenarios and camera characteristics listed above) used to structure the evaluation; see the figure in the paper.

Usage

  • test

HD1K

Source

CVPR2016

The HCI Benchmark Suite: Stereo And Flow Ground Truth With Uncertainties for Urban Autonomous Driving

HD1K Benchmark Suite

Keywords

28504 stereo multi weather conditions 2560x1080

Features

  • > 1000 frames at 2560x1080 with diverse lighting and weather scenarios
  • reference data with error bars for optical flow
  • evaluation masks for dynamic objects
  • specific robustness evaluation on challenging scenes
  • covers previously unavailable, challenging situations such as low light or rain and comes with pixel-wise uncertainties (one possible use of these uncertainties is sketched after this list).
  • a small fraction (we estimate much fewer than 0.1% of all pixels) of our ground truth contain wrong values, mainly due to current technological limits in LIDAR.
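
One plausible way to use such pixel-wise uncertainties during evaluation (an assumption for illustration, not necessarily the benchmark's official protocol) is to count a flow estimate as an outlier only when its end-point error exceeds both a fixed threshold and a multiple of the reference uncertainty:

```python
import numpy as np

def uncertainty_aware_outliers(flow_est, flow_gt, sigma, tau=3.0, k=3.0):
    """Mark a pixel as an outlier only if its end-point error exceeds both a
    fixed threshold `tau` (in px) and `k` times the per-pixel reference
    uncertainty `sigma` (H x W), so noisy reference pixels are judged leniently."""
    epe = np.linalg.norm(flow_est - flow_gt, axis=-1)
    return (epe > tau) & (epe > k * sigma)
```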

Comparison

Details are in Section 4.2 (Dataset Comparison) of the paper.


Usage

  • train
  • finetune
  • test

Others

Other datasets, and the reasons they were not considered further:

  • Make3D: outdoor, but not driving scenes, and dated
  • NYUv2: indoor
  • Middlebury: indoor
  • ETH3D: indoor & buildings, but cool
  • MPI-Sintel: animated, but cool
  • NTU RGB+D: indoor and motion-focused